
Clustering

Basically, clustering is needed to manage Docker containers running on multiple hosts in a distributed environment.

  • it helps to scale Docker containers effectively
  • it is a solution for enterprise deployments
  • there is a manager host and one or more agents on other hosts that join the cluster
  • any software or implementation of clustering shouldn't violate the Docker client principles: it must remain possible to perform the basic docker commands in the cluster using the CLI.
  • when creating a cluster (or performing scaling operations) it is necessary to keep in mind some prerequisites on the Linux OS side, such as correct permissions, volume handling, and so on

Using docker swarm for clustering

  1. Grab the latest version with the command:
docker pull swarm
  2. Create a simple swarm:
docker run --rm swarm create

This process returns a token that will be used by the other nodes joining the cluster. It is the unique identifier of the cluster and can also be used to retrieve information about it (how many nodes are defined in the cluster, and so on).
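As a minimal sketch, the token can be captured in a shell variable so that the later commands can reuse it (the variable name is purely illustrative):

```sh
# create the cluster and keep the returned token for the join/list commands below
TOKEN=$(docker run --rm swarm create)
echo "cluster token: $TOKEN"
```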

NOTE

When the token is needed (to discover services, for example), only its first 5 digits are required.

  3. Start a new node in the created cluster with the command:
sudo docker -H tcp://<host_id>:<port_id> -d &

So now the daemon on the new node will listen on the address above as its default socket, try to register the node in the cluster, and wait for connections. The complexity of this operation is that it requires managing the Docker socket directly and performing some dangerous operations, which can be skipped simply by using Docker Swarm.
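As a sketch of the Swarm-based alternative, a manager for the cluster can be started from the swarm image itself (port and token are placeholders):

```sh
# run the swarm manager; it exposes a Docker-compatible API and schedules
# containers across the nodes that joined with the same token
docker run -d -p <manager_port>:2375 swarm manage token://<token_id>
```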

NOTE

Before creating the cluster there is a list of checks that need to be reviewed, for example verifying that TLS is enabled and other settings; see the Docker checks.

Obtain list of nodes in the cluster

Using the token id it is very simple to retrieve the nodes defined in the cluster:

docker run --rm swarm list token://<token_id>

Discovery services

There are several discovery mechanisms that allow a node to join an existing cluster. In order to have a node discover and join a cluster, invoke the command:

docker run swarm join --advertise=<host_id>:<port_id> token://<token_id>

It is thus possible to verify whether the master node with the given id is ready to manage the other nodes that should be added to the cluster, according to this reference.
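Putting the two commands together, a minimal sketch of joining an agent and then verifying that it shows up (address and token are placeholders):

```sh
# advertise this host to the cluster identified by the token
docker run -d swarm join --advertise=<host_id>:<port_id> token://<token_id>

# the new node should now appear in the cluster's node list
docker run --rm swarm list token://<token_id>
```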

Using filters with docker swarm

  • Filters are used to schedule containers on a subset of nodes
  • they are an important part of the docker swarm manage command
  • there are several filter flags that can be used along with the swarm command to target a particular kind of node.
  • some examples: deploy a container on nodes that have an ssd storage drive, or that are located on the east coast of the US, or that are production nodes, and so on.

NOTE

If more than one node matches the filter, the Swarm manager will select one node at random from the resulting list.

Filter types:

  • constraint
  • a key/value pair associated with particular nodes

```sh
docker run -d -P -e constraint:storage==ssd --name test nginx
```
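For the constraint above to match anything, the node's daemon must carry the corresponding label; a minimal sketch, assuming the old daemon syntax used earlier in these notes:

```sh
# start the node's daemon with a storage label so constraint:storage==ssd can select it
sudo docker -d --label storage=ssd -H tcp://<host_id>:<port_id> &
```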

  • affinity
  • attaches a container to another based on a given label, id or name
  • we can instruct docker to run a container on the node where another specified container is already running. This can ensure, for example, that the containers are all on the same network.

Example

  • Spawn the first container (called nginx_server) as usual:
docker run -d -p 80:80 --name nginx_server nginx
  • Generate the next container, defining an affinity with the previously created container:
docker run -d --name second_container -e affinity:container==nginx_server nginx
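Classic swarm also documents "soft" affinities, where the rule is treated as a preference rather than a hard requirement; a hedged sketch, assuming the ~ syntax:

```sh
# the ~ makes the affinity soft: if no node satisfies it, scheduling proceeds anyway
docker run -d --name third_container -e affinity:container==~nginx_server nginx
```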

  • port

  • node selection based on a specific port
  • e.g. spawn containers that expose port 80. Of course these containers can be defined on multiple hosts in the cluster:
docker run -d -p 80:80 --name nginx_server nginx

NOTE

If we try to run this command more than once on a single node, the second attempt fails with an error because a container using port 80 is already defined on that node. On a cluster with several nodes, however, the port filter makes the Docker Swarm engine check whether another node in the cluster has port 80 available and not already occupied by a container.
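A minimal sketch of that behaviour against the swarm manager (container names are illustrative): each run binds host port 80, so each container must land on a different node:

```sh
# the port filter schedules each instance on a node whose port 80 is still free
docker run -d -p 80:80 --name web1 nginx
docker run -d -p 80:80 --name web2 nginx
```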

  • dependency
  • node selection based on a dependency on a given volume or link
docker run -d -P --link=db1:db1 --name db2 nginx

Here we assume there is an existing container called db1; the Docker Swarm engine will try to spawn the new container db2 on the same node where db1 is defined, because db2 has a dependency on db1.

NOTE

If Docker can't resolve the dependency (for example because the db1 container doesn't exist), the creation of db2 is stopped and the container is not created.

  • health
  • prevents scheduling containers on unhealthy nodes
  • standard filters are provided by docker itself out of the box (node ID/node name, storagedriver, kernelversion and so on)
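As a sketch of one of these built-in filters, a container can be pinned to a specific node by name (the node name is a placeholder):

```sh
# the built-in node constraint schedules the container only on the named node
docker run -d -e constraint:node==<node_name> nginx
```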

Running a container using a filter

For example, suppose I want to launch an nginx container on the nodes of a given cluster that have an ssd storage drive.
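A minimal sketch of such a command (the container name and the memory amount are illustrative):

```sh
# schedule nginx only on nodes labelled storage=ssd, reserving 1g of memory
docker run -d -P -m 1g -e constraint:storage==ssd --name nginx_ssd nginx
```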

NOTE

  • -d flag launches the container as a daemon
  • -P flag publishes the exposed ports
  • -e flag sets the constraint (with the assumption that a label or property exists for the defined constraint)
  • -m flag associates an amount of memory with the container

Swarm strategies

  • Swarming on a cluster is a complex operation under the hood, but from the Docker perspective it is just an API that can be used to define our infrastructure
  • Docker Swarm internally supports multiple strategies to allocate the containers participating in a cluster.
  • Strategies are algorithms that compute the score used to rank the nodes in a cluster
  • Depending on the strategy applied, a container can be defined on one node or on another.
  • The container will be defined on the node with the maximum score for the given strategy.
  • The flag --swarm-strategy < binpack | random | spread > should be used when the node is created in the cluster
  • Currently there are three types of strategies:

  • BinPack:

    • favors the nodes that are already running the maximum number of containers.
    • this strategy allows using as much of a node's space/memory capacity as possible before creating or using another one
  • Random:
    • is simpler, but can become very dangerous if the selected node is full
    • should not be used in production environments
  • Spread:
    • it selects only the nodes with the minimum number of running containers. This can be a good strategy to minimize the effort needed to restart containers when a node crashes for some reason.
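With the classic swarm image, the strategy can also be passed directly when starting the manager; a minimal sketch, assuming the swarm manage --strategy flag and placeholder port/token:

```sh
# rank nodes with the binpack strategy so containers are packed densely per node
docker run -d -p <manager_port>:2375 swarm manage --strategy binpack token://<token_id>
```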